Introduces API to get inference config, removes unused inference config defaults #890

Merged
bejaeger merged 5 commits into main from ben/remove-unused-inference-config-defaults on Apr 24, 2026

Conversation

@bejaeger bejaeger commented Apr 23, 2026

Collaborator

No description provided.

@bejaeger

Collaborator Author

This change is part of the following stack:

Change managed by git-spice.

@bejaeger bejaeger requested a review from a team as a code owner April 23, 2026 10:10
@bejaeger bejaeger requested review from priorphil and removed request for a team April 23, 2026 10:10
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@bejaeger bejaeger removed the request for review from priorphil April 23, 2026 10:15

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces a get_inference_config method to both the classifier and regressor, allowing users to access the active configuration before calling fit. It also cleans up the codebase by removing unused V2.6 preprocessor configurations and presets. The review feedback recommends returning a deep copy of the configuration object to prevent accidental mutation of the estimator's internal state and suggests standardizing docstring formatting for better consistency across the documentation.
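The deep-copy recommendation can be illustrated with a minimal sketch. The classes below are hypothetical stand-ins, not TabPFN's actual implementation; only the method name `get_inference_config` and the field `MAX_NUMBER_OF_CLASSES` come from this conversation, and `EXAMPLE_LIST_FIELD` is invented to show why a shallow copy would not be enough.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class InferenceConfigSketch:
    """Hypothetical stand-in for the real config object."""

    MAX_NUMBER_OF_CLASSES: int = 10
    # Invented mutable field, used only to demonstrate nested mutation.
    EXAMPLE_LIST_FIELD: list = field(default_factory=lambda: ["a"])


class EstimatorSketch:
    """Hypothetical estimator that hands out copies of its config."""

    def __init__(self) -> None:
        self._inference_config = InferenceConfigSketch()

    def get_inference_config(self) -> InferenceConfigSketch:
        # A deep copy protects nested mutable fields as well as top-level ones.
        return copy.deepcopy(self._inference_config)


est = EstimatorSketch()
cfg = est.get_inference_config()
cfg.MAX_NUMBER_OF_CLASSES = 999       # mutates only the caller's copy
cfg.EXAMPLE_LIST_FIELD.append("b")    # likewise for the nested list
```

With `copy.copy` instead of `copy.deepcopy`, the `append` call would leak into the estimator's internal list, which is the kind of accidental mutation the review comment warns about.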

Comment thread src/tabpfn/classifier.py Outdated
Comment thread src/tabpfn/regressor.py Outdated
Comment thread src/tabpfn/regressor.py Outdated

@adrian-prior adrian-prior left a comment

Contributor

LGTM!

@adrian-prior adrian-prior self-requested a review April 23, 2026 22:52

@adrian-prior adrian-prior left a comment

Contributor

Sorry,
I think I previously overlooked something. Can't we now remove _get_tabpfn_v2_6_config: https://github.com/PriorLabs/TabPFN/blob/231de0c/src/tabpfn/inference_config.py#L332-L364?

@bejaeger

Collaborator Author

Yep, should be removed with this PR?

@oscarkey oscarkey left a comment

Contributor

lgtm!
Maybe a unit test for each? 😇

@bejaeger bejaeger enabled auto-merge April 24, 2026 12:28
@bejaeger bejaeger added this pull request to the merge queue Apr 24, 2026
Merged via the queue into main with commit 0dff55b Apr 24, 2026
12 checks passed
LeoGrin added a commit to PriorLabs/tabpfn-extensions that referenced this pull request May 10, 2026
…nfig

PriorLabs/TabPFN#890 (which adds get_inference_config) is on TabPFN main
but not yet released — every released version (≤ v7.1.1) lacks the method.
For those, fall back to the historical hardcoded MAX_NUMBER_OF_CLASSES=10
when the base estimator's class lives in a tabpfn-prefixed module. Other
estimators (sklearn etc.) still get the explicit-alphabet ValueError.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LeoGrin added a commit to PriorLabs/tabpfn-extensions that referenced this pull request May 10, 2026
…282)

* Auto-infer alphabet_size in ManyClassClassifier from the base estimator's checkpoint

The previous fallback (`estimator.max_num_classes_`) has been dead since the
initial commit — TabPFN core has never set that attribute on any version
(v2 / v2.5 / v2.6 / v3), so users always had to pass `alphabet_size` explicitly
or hit `ValueError`. Now resolution cascades:

  1. Explicit `alphabet_size=...` (unchanged)
  2. `estimator.inference_config_.MAX_NUMBER_OF_CLASSES` (post-fit)
  3. Probe: fit a clone on 4×2 synthetic rows to populate `inference_config_`,
     then read `MAX_NUMBER_OF_CLASSES`
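
The three-step cascade above can be sketched roughly as follows. This is an illustration under assumptions, not the code from tabpfn-extensions: `resolve_alphabet_size` and the `probe` callable are hypothetical names, the estimators are `SimpleNamespace` stubs, and the error text is abbreviated.

```python
from types import SimpleNamespace
from typing import Any, Callable, Optional


def resolve_alphabet_size(
    explicit: Optional[int],
    estimator: Any,
    probe: Callable[[Any], Optional[int]],
) -> int:
    """Hypothetical helper mirroring the cascade described above."""
    # 1. An explicit alphabet_size always wins.
    if explicit is not None:
        return explicit
    # 2. A fitted estimator already carries inference_config_.
    config = getattr(estimator, "inference_config_", None)
    if config is not None:
        return config.MAX_NUMBER_OF_CLASSES
    # 3. Otherwise ask the probe, which may return None if it cannot tell.
    size = probe(estimator)
    if size is None:
        raise ValueError("alphabet_size must be specified")  # abbreviated wording
    return size


# Stub estimators standing in for fitted and unfitted base estimators.
fitted = SimpleNamespace(inference_config_=SimpleNamespace(MAX_NUMBER_OF_CLASSES=160))
unfitted = SimpleNamespace()

from_explicit = resolve_alphabet_size(5, fitted, lambda est: 10)
from_fitted = resolve_alphabet_size(None, fitted, lambda est: 10)
from_probe = resolve_alphabet_size(None, unfitted, lambda est: 10)
```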

The probe is cheap because TabPFN's `_load_checkpoint_cached` (`@lru_cache`)
makes the subsequent codebook fits reuse the already-loaded checkpoint — net
I/O is a single ckpt read. Verified with the v2.5, v2.6, and v3 default
checkpoints (alphabet auto-resolves to 10, 10, 160 respectively).
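
The caching argument can be seen in a small standalone sketch. `load_checkpoint_sketch` is a hypothetical stand-in for a cached loader and does not read any real file; the only thing taken from the commit message is the use of `functools.lru_cache`.

```python
from functools import lru_cache

# Records every call that actually executes the loader body.
load_calls: list = []


@lru_cache(maxsize=None)
def load_checkpoint_sketch(path: str) -> dict:
    """Hypothetical loader; a real one would read the checkpoint from disk."""
    load_calls.append(path)
    return {"path": path}


# The probe and the later fits ask for the same checkpoint path.
first = load_checkpoint_sketch("model.ckpt")
second = load_checkpoint_sketch("model.ckpt")
```

The second call returns the cached object without running the body again, which is why a probe followed by further fits on the same checkpoint costs only one load.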

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Log resolved alphabet_size and base-fit count in ManyClassClassifier.fit

Surfaces "Base estimator supports up to N classes; data has M — …" at
verbose=1, in both the no-mapping and codebook branches, so users can see
how the alphabet was resolved and how many base fits will follow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Strip categorical_features_indices on the probe clone

The probe fits on a synthetic 2-feature matrix; any user-provided
categorical indices ≥ 2 (e.g. categorical_features_indices=[3]) on the
base estimator would trip TabPFN's index-bounds validation before
inference_config_ gets populated. Reset them on the clone.

Surfaced by codex review on PR #282.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Restore clean ValueError when probe rejects the synthetic input

Pre-PR, a non-TabPFN base estimator with no alphabet_size always raised
the documented "alphabet_size must be specified ..." ValueError. After the
probe was added, that path could surface an arbitrary error from inside
the user's estimator instead.

Catch ValueError/TypeError around probe.fit (the typical sklearn
input-validation errors) and fall through to None so the documented
ValueError still fires. Heavier exceptions (RuntimeError, OSError, etc.)
still propagate so genuine bugs aren't masked.
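
A minimal sketch of that narrow catch, assuming a hypothetical `fit_probe` callable that performs the probe fit and returns the resulting config object:

```python
from types import SimpleNamespace
from typing import Any, Callable, Optional


def probe_alphabet_size(fit_probe: Callable[[], Any]) -> Optional[int]:
    """Hypothetical helper: return None when the probe input is rejected."""
    try:
        config = fit_probe()
    except (ValueError, TypeError):
        # Typical input-validation errors: report "unknown" so the caller
        # can raise its own documented ValueError.
        return None
    return getattr(config, "MAX_NUMBER_OF_CLASSES", None)


def accepts() -> Any:
    return SimpleNamespace(MAX_NUMBER_OF_CLASSES=10)


def rejects() -> Any:
    raise ValueError("synthetic input rejected")


def broken() -> Any:
    raise RuntimeError("genuine bug")


accepted = probe_alphabet_size(accepts)
rejected = probe_alphabet_size(rejects)
```

Calling `probe_alphabet_size(broken)` still raises `RuntimeError`, since only `ValueError` and `TypeError` are swallowed.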

Surfaced by cursor-bot review on PR #282.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix ruff D209 on _probe_alphabet_size docstring

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Use TabPFN's get_inference_config() instead of a probe fit

TabPFN exposes a public get_inference_config() method that loads the
checkpoint without fit data and returns the active InferenceConfig
(honoring any constructor override). Drop the probe fit, the synthetic
X_probe / y_probe construction, the categorical_features_indices reset,
and the (ValueError, TypeError) catch — all of it collapses into one
method call.
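
As a rough sketch of what the single call looks like from the caller's side: `StubEstimator` below is a hypothetical stand-in that returns a fixed config, and `infer_alphabet_size` is an invented helper name. Only `get_inference_config` and `MAX_NUMBER_OF_CLASSES` are taken from the text above.

```python
from types import SimpleNamespace
from typing import Any, Optional


class StubEstimator:
    """Hypothetical estimator exposing get_inference_config()."""

    def get_inference_config(self) -> Any:
        return SimpleNamespace(MAX_NUMBER_OF_CLASSES=160)


def infer_alphabet_size(estimator: Any) -> Optional[int]:
    """Hypothetical helper: read the class limit without fitting anything."""
    get_config = getattr(estimator, "get_inference_config", None)
    if get_config is None:
        return None  # estimator does not offer the method
    return get_config().MAX_NUMBER_OF_CLASSES


with_method = infer_alphabet_size(StubEstimator())
without_method = infer_alphabet_size(object())
```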

Suggested by adrian-prior on PR #282.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Default alphabet_size to 10 for older TabPFN without get_inference_config

PriorLabs/TabPFN#890 (which adds get_inference_config) is on TabPFN main
but not yet released — every released version (≤ v7.1.1) lacks the method.
For those, fall back to the historical hardcoded MAX_NUMBER_OF_CLASSES=10
when the base estimator's class lives in a tabpfn-prefixed module. Other
estimators (sklearn etc.) still get the explicit-alphabet ValueError.
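
The fallback for estimators that lack the method might look like the sketch below. The exact module check used in tabpfn-extensions is not shown in this conversation, so the prefix test here is an assumption, and `legacy_alphabet_default` is a hypothetical name.

```python
from typing import Any, Optional

# Historical hardcoded limit mentioned in the commit message.
LEGACY_MAX_NUMBER_OF_CLASSES = 10


def legacy_alphabet_default(estimator: Any) -> Optional[int]:
    """Hypothetical fallback, meant for estimators without get_inference_config."""
    module = type(estimator).__module__
    # Assumed check: the class lives in the tabpfn package or a submodule.
    if module == "tabpfn" or module.startswith("tabpfn."):
        return LEGACY_MAX_NUMBER_OF_CLASSES
    return None  # caller raises the explicit-alphabet ValueError


# Stub classes whose __module__ imitates the two cases.
OldTabPFN = type("OldTabPFN", (), {"__module__": "tabpfn.classifier"})
OtherModel = type("OtherModel", (), {"__module__": "sklearn.linear_model"})

tabpfn_default = legacy_alphabet_default(OldTabPFN())
other_default = legacy_alphabet_default(OtherModel())
```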

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

3 participants